Session 5aSCb: Production and Perception II: The Speech Segment (Poster Session)
5aSCb33. Automating phonetic measurement: The case of voice onset time
Abstract
We present an architecture for locating phonetic events accurately in time, and for measuring time differences between nearby events, using Voice Onset Time (VOT) as a case study. Although VOT remains a central concern in phonetics, VOT measurements generally still rely on human judgment. This requires significant labor, makes even large laboratory experiments onerous, and prevents the field from taking full advantage of the millions of hours of digital speech now becoming available. Our algorithm automates VOT measurement by combining HMM forced alignment, which determines approximate stop boundaries, with paired burst and voicing onset detectors. Each detector is a frame-level max-margin classifier operating on the scale-space projection of a small number of relevant acoustic features. On a large set of clean lab speech, this system has a mean absolute error (relative to human annotation) of only 2.8 ms, with 98% of errors below 10 ms. On a subcorpus independently annotated by two of the authors, the system agreed with the two human annotators as well as they agreed with one another (1.49 ms vs. 1.50 ms). Promising results on other datasets will be reported. The system will be released as open-source software.
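As a rough illustration of the measurement and the reported error statistics, the sketch below derives a VOT value from two sequences of frame-level detector scores and then summarizes agreement with human annotation. It is not the authors' implementation. The abstract does not say how detector outputs are decoded, so the first-threshold-crossing rule, the 1 ms frame step, the zero decision threshold, and all function names here are assumptions made for the example. The score sequences are assumed to cover only the window around the stop that forced alignment would supply.

```python
from typing import Optional, Sequence, Tuple


def first_frame_above(scores: Sequence[float], threshold: float,
                      start: int = 0) -> Optional[int]:
    """Return the index of the first frame at or after `start` whose score
    exceeds `threshold`, or None if no frame does."""
    for i in range(start, len(scores)):
        if scores[i] > threshold:
            return i
    return None


def vot_ms(burst_scores: Sequence[float], voicing_scores: Sequence[float],
           frame_step_ms: float = 1.0, threshold: float = 0.0) -> Optional[float]:
    """Estimate VOT as the time from burst onset to voicing onset.

    Assumption: each detector emits one signed score per frame, and the
    onset is the first frame whose score crosses `threshold`. The voicing
    search starts at the burst frame, so the result is never negative.
    """
    burst = first_frame_above(burst_scores, threshold)
    if burst is None:
        return None
    voicing = first_frame_above(voicing_scores, threshold, start=burst)
    if voicing is None:
        return None
    return (voicing - burst) * frame_step_ms


def error_summary(predicted_ms: Sequence[float], reference_ms: Sequence[float],
                  tolerance_ms: float = 10.0) -> Tuple[float, float]:
    """Return (mean absolute error, fraction of errors below tolerance)."""
    errors = [abs(p - r) for p, r in zip(predicted_ms, reference_ms)]
    mae = sum(errors) / len(errors)
    within = sum(e < tolerance_ms for e in errors) / len(errors)
    return mae, within
```

Restricting the voicing search to frames at or after the burst is a simplification that suits positive-VOT stops only; prevoiced stops would need a different pairing rule.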